This report explores a data set containing financial contributions made by California residents to Presidential candidates in the 2016 Presidential election.
## 'data.frame': 1125659 obs. of 19 variables:
## $ cmte_id : Factor w/ 25 levels "C00458844","C00500587",..: 6 6 6 7 7 7 7 6 7 7 ...
## $ cand_id : Factor w/ 25 levels "P00003392","P20002671",..: 1 1 1 12 12 12 12 1 12 12 ...
## $ cand_nm : Factor w/ 25 levels "Bush, Jeb","Carson, Benjamin S.",..: 4 4 4 20 20 20 20 4 20 20 ...
## $ contbr_nm : Factor w/ 195943 levels "0, J.","AAGAARD, DAVID",..: 6546 26587 59263 99779 101025 101025 101084 78017 101106 101131 ...
## $ contbr_city : Factor w/ 2118 levels "",".","1000 OAKS",..: 937 256 603 255 1503 1503 2003 887 2047 1371 ...
## $ contbr_st : Factor w/ 1 level "CA": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : Factor w/ 134034 levels "","00000","000090272",..: 107295 66913 50011 61864 16081 16081 42612 54660 57711 108598 ...
## $ contbr_employer : Factor w/ 57983 levels "","-","--","---",..: 34229 34229 34229 4075 54897 54897 36555 34229 35407 44971 ...
## $ contbr_occupation: Factor w/ 25386 levels "","-","--","---",..: 18976 18976 18976 21213 15896 15896 17393 18976 14740 6604 ...
## $ contb_receipt_amt: num 50 200 5 40 35 100 25 40 10 15 ...
## $ contb_receipt_dt : Factor w/ 659 levels "01-APR-15","01-APR-16",..: 540 408 25 78 99 121 78 408 99 121 ...
## $ receipt_desc : Factor w/ 74 levels "","2016 SENATE PRIMARY DONOR REDESIGNATION FROM PRIMARY",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 2 2 2 1 1 1 1 2 1 1 ...
## $ memo_text : Factor w/ 423 levels "","*","$0.02 REFUNDED ON 10/21/2016",..: 198 198 198 151 151 151 151 198 151 151 ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 2 2 2 1 1 1 1 2 1 1 ...
## $ file_num : int 1091718 1091718 1091718 1077404 1077404 1077404 1077404 1091718 1077404 1077404 ...
## $ tran_id : Factor w/ 1122205 levels "A000771210424405B8CF",..: 327338 326620 324002 858677 860121 862422 858139 326658 860117 863334 ...
## $ election_tp : Factor w/ 4 levels "","G2016","P2016",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ X : logi NA NA NA NA NA NA ...
## cmte_id cand_id cand_nm
## C00575795:547211 P00003392:547211 Clinton, Hillary Rodham :547211
## C00577130:407172 P60007168:407172 Sanders, Bernard :407172
## C00574624: 57820 P60006111: 57820 Cruz, Rafael Edward 'Ted': 57820
## C00580100: 50168 P80001571: 50168 Trump, Donald J. : 50168
## C00573519: 27362 P60005915: 27362 Carson, Benjamin S. : 27362
## C00458844: 14092 P60006723: 14092 Rubio, Marco : 14092
## (Other) : 21834 (Other) : 21834 (Other) : 21834
## contbr_nm contbr_city contbr_st
## MITCHELL, MARCIA : 388 LOS ANGELES : 88041 CA:1125659
## PETIT, MICHAEL : 352 SAN FRANCISCO: 78340
## CARROLL, TERI : 333 SAN DIEGO : 39967
## SAMATUA, DENISE : 332 OAKLAND : 28998
## SMITH, CHERYL : 324 SAN JOSE : 26420
## MONTANELLI, TERESA: 295 SACRAMENTO : 20651
## (Other) :1123635 (Other) :843242
## contbr_zip contbr_employer contbr_occupation
## 92660 : 449 N/A :148625 RETIRED :215855
## 92037 : 407 RETIRED :103846 NOT EMPLOYED :113153
## 900363146: 388 SELF-EMPLOYED: 94783 ATTORNEY : 30572
## 911075001: 352 NONE : 84250 TEACHER : 25486
## 926372766: 335 NOT EMPLOYED : 49114 INFORMATION REQUESTED: 20936
## 932631317: 333 (Other) :644474 (Other) :719524
## (Other) :1123395 NA's : 567 NA's : 133
## contb_receipt_amt contb_receipt_dt
## Min. :-10500.0 29-FEB-16: 11735
## 1st Qu.: 15.0 31-MAR-16: 11506
## Median : 27.0 31-MAY-16: 10435
## Mean : 121.8 30-APR-16: 9479
## 3rd Qu.: 97.0 26-SEP-16: 9237
## Max. : 10800.0 08-JUN-16: 8901
## (Other) :1064366
## receipt_desc memo_cd
## :1110614 :981391
## Refund : 8568 X:144268
## REDESIGNATION FROM PRIMARY : 1324
## REDESIGNATION TO GENERAL : 1324
## REATTRIBUTION / REDESIGNATION REQUESTED: 569
## REDESIGNATION TO CRUZ FOR SENATE : 544
## (Other) : 2716
## memo_text form_tp
## :624511 SA17A:979490
## * EARMARKED CONTRIBUTION: SEE BELOW:390588 SA18 :137601
## * HILLARY VICTORY FUND :100319 SB28A: 8568
## REDESIGNATION FROM PRIMARY : 1324
## REDESIGNATION TO GENERAL : 1324
## *BEST EFFORTS UPDATE : 1075
## (Other) : 6518
## file_num tran_id election_tp
## Min. :1003942 A5602AD777C8C4632B5A: 4 : 1425
## 1st Qu.:1077665 ADB49CB248C174E298F0: 4 G2016:313746
## Median :1091720 A26C35A6066754130B99: 3 P2016:810481
## Mean :1090286 A340DF85B7F884133A20: 3 P2020: 7
## 3rd Qu.:1104813 A4E50E2DD07E4475996F: 3
## Max. :1119833 A7C22FA389E0348F98F0: 3
## (Other) :1125639
## X
## Mode:logical
## NA's:1125659
The data set contains 1,125,659 donations and 19 variables. Most of these variables are not necessary for my analysis, so I removed them from the data frame before plotting. The resulting data frame has the same number of observations but only 4 variables.
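A minimal sketch of this column selection in base R, using a tiny made-up data frame in place of the real file. The name `toy` and its values are hypothetical; only the four kept column names come from the output above.

```r
# Toy stand-in for the full data frame (hypothetical values; the real data has 19 columns).
toy <- data.frame(
  cand_nm           = c("Sanders, Bernard", "Clinton, Hillary Rodham"),
  contbr_city       = c("OAKLAND", "LOS ANGELES"),
  contbr_occupation = c("TEACHER", "RETIRED"),
  contb_receipt_amt = c(27, 50),
  memo_cd           = c("", "X"),
  file_num          = c(1077404L, 1091718L),
  stringsAsFactors  = TRUE
)

# Keep only the four variables used in the rest of the analysis.
keep <- c("cand_nm", "contbr_city", "contbr_occupation", "contb_receipt_amt")
toy_small <- toy[, keep]

# Same number of rows, fewer columns.
str(toy_small)
```

Selecting by name rather than by position keeps the step robust if the column order in the source file changes.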
## 'data.frame': 1125659 obs. of 4 variables:
## $ cand_nm : Factor w/ 25 levels "Bush, Jeb","Carson, Benjamin S.",..: 4 4 4 20 20 20 20 4 20 20 ...
## $ contbr_city : Factor w/ 2118 levels "",".","1000 OAKS",..: 937 256 603 255 1503 1503 2003 887 2047 1371 ...
## $ contbr_occupation: Factor w/ 25386 levels "","-","--","---",..: 18976 18976 18976 21213 15896 15896 17393 18976 14740 6604 ...
## $ contb_receipt_amt: num 50 200 5 40 35 100 25 40 10 15 ...
## cand_nm contbr_city
## Clinton, Hillary Rodham :547211 LOS ANGELES : 88041
## Sanders, Bernard :407172 SAN FRANCISCO: 78340
## Cruz, Rafael Edward 'Ted': 57820 SAN DIEGO : 39967
## Trump, Donald J. : 50168 OAKLAND : 28998
## Carson, Benjamin S. : 27362 SAN JOSE : 26420
## Rubio, Marco : 14092 SACRAMENTO : 20651
## (Other) : 21834 (Other) :843242
## contbr_occupation contb_receipt_amt
## RETIRED :215855 Min. :-10500.0
## NOT EMPLOYED :113153 1st Qu.: 15.0
## ATTORNEY : 30572 Median : 27.0
## TEACHER : 25486 Mean : 121.8
## INFORMATION REQUESTED: 20936 3rd Qu.: 97.0
## (Other) :719524 Max. : 10800.0
## NA's : 133
First, I looked at the candidates and how many donations each one received.
| cand_nm | count | sum | median | percent_count | percent_sum |
|---|---|---|---|---|---|
| Bush, Jeb | 3130 | 3300291.83 | 500 | 0.28 | 2.41 |
| Carson, Benjamin S. | 27362 | 2924593.00 | 50 | 2.43 | 2.13 |
| Christie, Christopher J. | 333 | 456066.00 | 1000 | 0.03 | 0.33 |
| Clinton, Hillary Rodham | 547211 | 83781357.32 | 25 | 48.61 | 61.10 |
| Cruz, Rafael Edward ‘Ted’ | 57820 | 5735382.27 | 50 | 5.14 | 4.18 |
| Fiorina, Carly | 4696 | 1468489.42 | 100 | 0.42 | 1.07 |
| Gilmore, James S III | 3 | 8100.00 | 2700 | 0.00 | 0.01 |
| Graham, Lindsey O. | 347 | 414495.00 | 1000 | 0.03 | 0.30 |
| Huckabee, Mike | 531 | 230890.60 | 50 | 0.05 | 0.17 |
| Jindal, Bobby | 31 | 23231.26 | 250 | 0.00 | 0.02 |
| cand_nm | percent_count |
|---|---|
| Clinton, Hillary Rodham | 48.61 |
| Sanders, Bernard | 36.17 |
| Cruz, Rafael Edward ‘Ted’ | 5.14 |
| Trump, Donald J. | 4.46 |
| Carson, Benjamin S. | 2.43 |
| Rubio, Marco | 1.25 |
| Fiorina, Carly | 0.42 |
| Paul, Rand | 0.38 |
| Bush, Jeb | 0.28 |
| Kasich, John R. | 0.27 |
As can be seen above, Hillary Clinton and Bernie Sanders received the most donations by far (about 49% and 36% of the total number of donations, respectively). This doesn’t tell us anything about the size of the donations they received, so I will look at that later in the Bivariate Analysis.
Next I wanted to look at counts for political parties, but since there wasn’t a variable for that I had to make one. To do that, I wrote a function to match each candidate to their political party and then created the graph shown below.
| cand_party | count | sum | median | percent_count | percent_sum |
|---|---|---|---|---|---|
| Democrat | 955258 | 103970224.8 | 27 | 84.86 | 75.82 |
| Republican | 166747 | 32420371.5 | 50 | 14.81 | 23.64 |
| Libertarian | 1591 | 461430.6 | 100 | 0.14 | 0.34 |
| Green | 1907 | 245490.5 | 50 | 0.17 | 0.18 |
| Independent | 156 | 35135.5 | 100 | 0.01 | 0.03 |
| cand_party | percent_count |
|---|---|
| Democrat | 84.86 |
| Republican | 14.81 |
| Green | 0.17 |
| Libertarian | 0.14 |
| Independent | 0.01 |
As the candidate graph above suggested, Democrats received the most donations overall, taking in just under 85% of the total number of donations.
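The candidate-to-party matching could be sketched as a named lookup vector. This is an illustration, not the function actually used in this report: `party_lookup` and `assign_party` are hypothetical names, and treating every candidate not listed as a Republican is an assumption that happens to hold for the 25 candidates in this data set.

```r
# Non-Republican candidates in this data set, keyed by the cand_nm spelling.
party_lookup <- c(
  "Clinton, Hillary Rodham" = "Democrat",
  "Sanders, Bernard"        = "Democrat",
  "O'Malley, Martin Joseph" = "Democrat",
  "Lessig, Lawrence"        = "Democrat",
  "Webb, James Henry Jr."   = "Democrat",
  "Johnson, Gary"           = "Libertarian",
  "Stein, Jill"             = "Green",
  "McMullin, Evan"          = "Independent"
)

# Hypothetical helper: map candidate names to a party factor.
assign_party <- function(cand_nm) {
  party <- unname(party_lookup[as.character(cand_nm)])
  # Assumption: any candidate not in the lookup ran as a Republican.
  party[is.na(party)] <- "Republican"
  factor(party,
         levels = c("Democrat", "Republican", "Libertarian", "Green", "Independent"))
}

assign_party(c("Sanders, Bernard", "Rubio, Marco", "Stein, Jill"))
```

With the real data this would be applied as something like `PCF$cand_party <- assign_party(PCF$cand_nm)`, where `PCF` is the data frame name that appears in the output later in this report.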
| contb_receipt_amt | count | sum | median | percent_count | percent_sum |
|---|---|---|---|---|---|
| -10500 | 1 | -10500 | -10500 | 0 | -0.01 |
| -10000 | 1 | -10000 | -10000 | 0 | -0.01 |
| -8460 | 1 | -8460 | -8460 | 0 | -0.01 |
| -8300 | 1 | -8300 | -8300 | 0 | -0.01 |
| -8100 | 1 | -8100 | -8100 | 0 | -0.01 |
| -5825 | 1 | -5825 | -5825 | 0 | 0.00 |
| contb_receipt_amt | percent_count |
|---|---|
| 25 | 13.74 |
| 50 | 12.40 |
| 100 | 11.25 |
| 10 | 9.04 |
| 5 | 6.44 |
| 27 | 5.39 |
| 15 | 4.90 |
| 250 | 3.75 |
| 19 | 2.07 |
| 35 | 2.07 |
The top three donation amounts are $25, $50, and $100. Within the top 10, all but $250 were small donations of $100 or less. Also, most of these amounts are divisible by five, with the exceptions of $27 and $19. I wonder why those two amounts were so numerous.
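A ranking of the most common amounts like the one above can be produced with `table()` and `prop.table()`. The sketch below uses a short made-up vector of amounts, so the percentages are illustrative only.

```r
# Hypothetical donation amounts (negative values stand for refunds).
amounts <- c(25, 25, 25, 50, 50, 27, 100, -10)

# Count how often each distinct amount occurs, most frequent first.
freq <- sort(table(amounts), decreasing = TRUE)

# Convert counts to a percentage of all donations.
percent_count <- round(100 * prop.table(freq), 2)

head(percent_count, 3)
```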
| contbr_city | count | sum | median | lat | lon | percent_count | percent_sum |
|---|---|---|---|---|---|---|---|
|  | 25 | 5725.00 | 100.0 | 36.77826 | -119.4179 | 0.00 | 0.00 |
| . | 1 | 2.40 | 2.4 | 36.77826 | -119.4179 | 0.00 | 0.00 |
| 1000 OAKS | 1 | 100.00 | 100.0 | 34.17056 | -118.8376 | 0.00 | 0.00 |
| 29 PALMS | 100 | 2990.52 | 27.0 | 34.13556 | -116.0542 | 0.01 | 0.00 |
| -4086 | 1 | 40.00 | 40.0 | 37.38580 | -121.9731 | 0.00 | 0.00 |
| 90620BUENA PARK | 1 | 250.00 | 250.0 | 33.84287 | -118.0128 | 0.00 | 0.00 |
| 91352 | 1 | 250.00 | 250.0 | 34.23016 | -118.3520 | 0.00 | 0.00 |
| 91355 | 1 | 28.00 | 28.0 | 34.44003 | -118.5915 | 0.00 | 0.00 |
| 93271THREE RIVERS | 2 | 500.00 | 250.0 | 36.43884 | -118.9045 | 0.00 | 0.00 |
| ACAMPO | 72 | 9991.93 | 50.0 | 38.17464 | -121.2786 | 0.01 | 0.01 |
A test mapping I performed revealed some locations outside of California, so I decided to look at where all of the points fall by mapping them on a world map as well as a California map.
After mapping all the points on the world map, all of the points outside of California can be clearly seen. I’m assuming the donations that came from outside California came from California residents who are living out of state, but further research into who made the donations would need to be made in order to know for sure.
Looking at the map of California only, there is a decent scattering of locations all over the state, but there is definitely a clustering of donations coming from the areas surrounding San Francisco, Los Angeles, and San Diego.
Let’s zoom in on those dense areas to get a better idea of the layout in those areas.
| contbr_city | percent_count |
|---|---|
| LOS ANGELES | 7.82 |
| SAN FRANCISCO | 6.96 |
| SAN DIEGO | 3.55 |
| OAKLAND | 2.58 |
| SAN JOSE | 2.35 |
| SACRAMENTO | 1.83 |
| BERKELEY | 1.82 |
| LONG BEACH | 1.20 |
| SANTA MONICA | 1.11 |
| PASADENA | 0.98 |
Zooming in on these areas shows that the highest counts in these areas, and in all of California, seem to be in San Francisco, Los Angeles, and San Diego. The table of cities by percent count confirms this. This isn’t unexpected, as they are among the largest cities in California.
After location based on cities, I moved on to explore how occupation affected donations. However, as the graph shows below, the x-axis is overcrowded and some cleaning is needed in order to make the plot readable.
To clean the data I used information from the United States Census Bureau based on the North American Industry Classification System. Using that chart I wrote a function to group the various occupations into industry groups. However, the grouping is not exhaustive. There are 25,387 unique occupations listed in the data set, and trying to sort all of them into their respective categories would have taken an inordinate amount of time. As a result I only included the top 100 occupations by count.
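The grouping step can be sketched as another lookup, with unmapped occupations left as `NA`. The four mappings below are illustrative assumptions, not the report's actual top-100 mapping, and `occupation_lookup` and `categorise_occupation` are hypothetical names.

```r
# Illustrative subset of an occupation -> industry-group mapping (assumed, not exhaustive).
occupation_lookup <- c(
  "RETIRED"      = "Retired",
  "NOT EMPLOYED" = "Unemployed",
  "ATTORNEY"     = "Professional, Scientific, and Technical Services",
  "TEACHER"      = "Educational Services"
)

# Hypothetical helper: normalise the raw text, then look up its category.
categorise_occupation <- function(occupation) {
  key <- toupper(trimws(as.character(occupation)))
  # Occupations missing from the lookup come back as NA.
  unname(occupation_lookup[key])
}

categorise_occupation(c("Retired", " teacher ", "ASTRONAUT"))
```

Leaving unmapped occupations as `NA` keeps them visible in later tables instead of silently folding them into a catch-all group.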
| contbr_occup_categ | count | sum | median | percent_count | percent_sum |
|---|---|---|---|---|---|
| Retired | 215855 | 21938582.5 | 30.00 | 31.72 | 26.02 |
| Professional, Scientific, and Technical Services | 132483 | 20609591.9 | 38.00 | 19.47 | 24.45 |
| Health Care and Social Assistance | 45502 | 4599496.3 | 27.00 | 6.69 | 5.46 |
| Administrative and Support and Waste Management and Remediation Services | 14042 | 1173176.8 | 25.00 | 2.06 | 1.39 |
| Unemployed | 115855 | 6414819.2 | 27.00 | 17.02 | 7.61 |
| Educational Services | 48236 | 3643095.4 | 25.00 | 7.09 | 4.32 |
| Arts, Entertainment, and Recreation | 32978 | 4741888.6 | 27.00 | 4.85 | 5.62 |
| Management of Companies and Enterprises | 34821 | 11130702.3 | 50.00 | 5.12 | 13.20 |
| Student | 7531 | 914671.8 | 20.16 | 1.11 | 1.08 |
| Real Estate and Rental and Leasing | 9668 | 2127758.5 | 40.00 | 1.42 | 2.52 |
| contbr_occup_categ | percent_count |
|---|---|
| Retired | 31.72 |
| Professional, Scientific, and Technical Services | 19.47 |
| Unemployed | 17.02 |
| Educational Services | 7.09 |
| Health Care and Social Assistance | 6.69 |
| Management of Companies and Enterprises | 5.12 |
| Arts, Entertainment, and Recreation | 4.85 |
| Homemaker | 2.14 |
| Administrative and Support and Waste Management and Remediation Services | 2.06 |
| Real Estate and Rental and Leasing | 1.42 |
| Student | 1.11 |
| Finance and Insurance | 0.49 |
| Transportation and Warehousing | 0.36 |
| Information | 0.25 |
| Agriculture, Forestry, Fishing and Hunting | 0.23 |
Looking at the graph it’s easy to see that retirees were by far the most active contributors, followed by people in the “Professional, Scientific, and Technical Services” industry, but the most surprising result was “Unemployed” people coming in at 3rd place (not that far behind 2nd place, actually). I would assume unemployed people wouldn’t have the extra cash lying around to make donations; however, the data seem to show otherwise.
There are 1,125,659 donations described by 19 variables. Most of the 19 variables, however, were unimportant to my analysis, so I dropped them. This left me with only 4 variables to work with at the start (cand_nm, contbr_city, contbr_occupation, and contb_receipt_amt).
The main features of interest are donations in terms of the number and amount in dollars, the candidates, the contributors in terms of their occupation, and the location of donations (cand_nm, contb_receipt_amt, contbr_occupation, contbr_city). I want to see how they all interact with each other and which features influenced the count and amount of donations candidates received.
Yes, I created a variable for political parties (cand_party) and one for occupation categories (contbr_occup_categ). To reiterate what I stated above with the plots, for political parties I wrote a function to match each candidate to their political party. For occupation categories I cleaned the data using information from the United States Census Bureau based on the North American Industry Classification System. Using that chart I wrote a function to group the various occupations into industry groups. However, the grouping is not exhaustive. There are 25,387 unique occupations listed in the data set, and trying to sort all of them into their respective categories would have taken an inordinate amount of time. As a result I only included the top 100 occupations listed in the summary for “contbr_occupation”, grouping the contributors’ occupations by major industry.
I created a variety of new tables for each variable to get counts of the donations, sums of the donations, median donations, and percentages for counts and sums. I created these tables to get these extra statistics on the variables and to help with making certain plots (particularly for plotting cities on a map to show the geographical distribution of the donations).
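One way to build such a summary table in base R is sketched below. The helper name `summarise_by` and the toy data are hypothetical; the output columns mirror the tables shown earlier (count, sum, median, percent_count, percent_sum).

```r
# Hypothetical helper: per-group count, sum, median, and percentage shares.
summarise_by <- function(df, group_col, amt_col = "contb_receipt_amt") {
  amt <- df[[amt_col]]
  grp <- droplevels(as.factor(df[[group_col]]))

  count <- as.vector(table(grp))               # donations per group
  total <- as.vector(tapply(amt, grp, sum))    # dollars per group
  med   <- as.vector(tapply(amt, grp, median)) # median donation per group

  out <- data.frame(
    group         = levels(grp),
    count         = count,
    sum           = total,
    median        = med,
    percent_count = round(100 * count / sum(count), 2),
    percent_sum   = round(100 * total / sum(total), 2)
  )
  # Largest groups first, as in the percent_count tables.
  out[order(-out$count), ]
}

toy <- data.frame(
  cand_party        = c("Democrat", "Democrat", "Democrat", "Republican"),
  contb_receipt_amt = c(25, 27, 50, 100)
)
res <- summarise_by(toy, "cand_party")
res
```

Note that refunds (negative amounts) reduce a group's sum, so `percent_sum` reflects net dollars rather than gross contributions.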
| cand_nm | percent_sum |
|---|---|
| Clinton, Hillary Rodham | 61.10 |
| Sanders, Bernard | 14.31 |
| Trump, Donald J. | 7.24 |
| Cruz, Rafael Edward ‘Ted’ | 4.18 |
| Rubio, Marco | 3.53 |
| Bush, Jeb | 2.41 |
| Carson, Benjamin S. | 2.13 |
| Kasich, John R. | 1.11 |
| Fiorina, Carly | 1.07 |
| Paul, Rand | 0.58 |
It’s a bit hard to see all of the IQRs, so let’s zoom in a bit to get a better look.
When it comes to total donations in dollars, Hillary Clinton is the clear winner with about 61% of donations going to her.
There seems to be quite a variety of IQRs among the candidates. Interestingly, many candidates have median donations of $25, $50, or $100 (the three most common donation amounts). Also, most have relatively long IQRs, and a few have quite long IQRs and high median donation amounts. A full breakdown of the numbers is below.
## PCF$cand_nm: Bush, Jeb
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400 50 500 1054 2700 10000
## --------------------------------------------------------
## PCF$cand_nm: Carson, Benjamin S.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10000.0 25.0 50.0 106.9 100.0 10000.0
## --------------------------------------------------------
## PCF$cand_nm: Christie, Christopher J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 100 1000 1370 2700 5400
## --------------------------------------------------------
## PCF$cand_nm: Clinton, Hillary Rodham
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 15.0 25.0 153.1 100.0 10000.0
## --------------------------------------------------------
## PCF$cand_nm: Cruz, Rafael Edward 'Ted'
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8300.00 25.00 50.00 99.19 100.00 10800.00
## --------------------------------------------------------
## PCF$cand_nm: Fiorina, Carly
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3300.0 25.0 100.0 312.7 250.0 5400.0
## --------------------------------------------------------
## PCF$cand_nm: Gilmore, James S III
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2700 2700 2700 2700 2700 2700
## --------------------------------------------------------
## PCF$cand_nm: Graham, Lindsey O.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 100 1000 1195 2700 8100
## --------------------------------------------------------
## PCF$cand_nm: Huckabee, Mike
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 25.0 50.0 434.8 500.0 5400.0
## --------------------------------------------------------
## PCF$cand_nm: Jindal, Bobby
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.1 250.0 250.0 749.4 1000.0 2700.0
## --------------------------------------------------------
## PCF$cand_nm: Johnson, Gary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -607.9 50.0 100.0 290.0 250.0 2742.0
## --------------------------------------------------------
## PCF$cand_nm: Kasich, John R.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 50.0 100.0 505.7 500.0 2700.0
## --------------------------------------------------------
## PCF$cand_nm: Lessig, Lawrence
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 50.0 250.0 500.4 500.0 2700.0
## --------------------------------------------------------
## PCF$cand_nm: McMullin, Evan
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -500.0 25.0 100.0 225.2 250.0 2700.0
## --------------------------------------------------------
## PCF$cand_nm: O'Malley, Martin Joseph
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 50.0 250.0 750.2 1000.0 5400.0
## --------------------------------------------------------
## PCF$cand_nm: Pataki, George E.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 100 500 1000 1522 2700 2700
## --------------------------------------------------------
## PCF$cand_nm: Paul, Rand
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400 25 50 187 100 5400
## --------------------------------------------------------
## PCF$cand_nm: Perry, James R. (Rick)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 1000 2700 1797 2700 2700
## --------------------------------------------------------
## PCF$cand_nm: Rubio, Marco
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 25.0 75.0 343.6 250.0 5400.0
## --------------------------------------------------------
## PCF$cand_nm: Sanders, Bernard
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10500.00 15.00 27.00 48.21 50.00 10000.00
## --------------------------------------------------------
## PCF$cand_nm: Santorum, Richard J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 25.0 65.0 413.7 262.5 2700.0
## --------------------------------------------------------
## PCF$cand_nm: Stein, Jill
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -300.0 29.0 50.0 128.7 100.0 2700.0
## --------------------------------------------------------
## PCF$cand_nm: Trump, Donald J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3716.0 28.0 80.0 197.8 200.0 5400.0
## --------------------------------------------------------
## PCF$cand_nm: Walker, Scott
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 100.0 250.0 676.1 1000.0 10800.0
## --------------------------------------------------------
## PCF$cand_nm: Webb, James Henry Jr.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 100.0 275.0 722.3 875.0 5400.0
Looking at donation amounts by candidate, while Hillary Clinton and Bernie Sanders were again the top two, they had two of the lowest median donations at $25 and $27 respectively. This helps solve the mystery of $27 being one of the most common donation amounts. Since Bernie had the second most donations by count, it makes sense that his median donation would be one of the most common. Why $27 was such a popular amount to give Bernie remains a mystery, however.
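The per-candidate breakdown above appears to have the shape of base R's `by()` output applied to `summary`. A small sketch on made-up data (the candidate labels "A" and "B" and the amounts are hypothetical):

```r
# Toy data: two hypothetical candidates with a handful of donations each.
toy <- data.frame(
  cand_nm           = factor(c("A", "A", "A", "B", "B")),
  contb_receipt_amt = c(15, 27, 50, 25, 100)
)

# Five-number summary plus mean for each candidate.
by(toy$contb_receipt_amt, toy$cand_nm, summary)

# Just the medians, as a named vector that is easy to sort or compare.
medians <- tapply(toy$contb_receipt_amt, toy$cand_nm, median)
sort(medians)
```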
| cand_party | percent_sum |
|---|---|
| Democrat | 75.82 |
| Republican | 23.64 |
| Libertarian | 0.34 |
| Green | 0.18 |
| Independent | 0.03 |
We can see that Democrats and Republicans received some of the highest donation amounts, but we can’t really make out the IQRs. Let’s zoom in to get a closer look.
## PCF$cand_party: Democrat
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10500.0 15.0 27.0 108.8 67.0 10000.0
## --------------------------------------------------------
## PCF$cand_party: Republican
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10000.0 25.0 50.0 194.4 100.0 10800.0
## --------------------------------------------------------
## PCF$cand_party: Libertarian
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -607.9 50.0 100.0 290.0 250.0 2742.0
## --------------------------------------------------------
## PCF$cand_party: Green
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -300.0 29.0 50.0 128.7 100.0 2700.0
## --------------------------------------------------------
## PCF$cand_party: Independent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -500.0 25.0 100.0 225.2 250.0 2700.0
The results for political party are definitely not a surprise given how much more money Hillary and Bernie received individually (especially Hillary), though Democrats did have a smaller percentage of total donations in dollars than by count. However, while Democrats received the most overall, they have the lowest median donation of all the parties. Does this mean that Democratic donors tend to be poorer? That they have less money to spare than the others? More information would be needed to answer these questions. Also, the quartiles of the other parties tend to fall at more regular amounts like $25, $50, and $100. Perhaps Democratic donors were just more likely to use the “Other” amount box instead of choosing the preset suggested donation amounts that are often given.
Regardless, in order to find out if there is a statistically significant difference between the political parties, I will conduct a Kruskal-Wallis rank sum test.
##
## Kruskal-Wallis rank sum test
##
## data: contb_receipt_amt by cand_party
## Kruskal-Wallis chi-squared = 35874, df = 4, p-value < 2.2e-16
The results above show that, with a p-value well below 0.05, there is indeed a statistically significant difference in donation amounts among the political parties.
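For reference, a Kruskal-Wallis test of this form can be run with `kruskal.test()` from base R's stats package. The sketch below uses simulated amounts for three hypothetical groups, so its statistic and p-value say nothing about the real data.

```r
set.seed(1)

# Simulated, right-skewed donation amounts for three groups (illustrative only).
toy <- data.frame(
  cand_party = factor(rep(c("Democrat", "Republican", "Libertarian"), each = 30)),
  contb_receipt_amt = c(rexp(30, 1 / 30), rexp(30, 1 / 60), rexp(30, 1 / 120))
)

# Rank-based test of whether the groups share the same distribution.
kw <- kruskal.test(contb_receipt_amt ~ cand_party, data = toy)

kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom = number of groups - 1
kw$p.value
```

A rank-based test suits this data because donation amounts are heavily skewed and include large outliers, which would strain the assumptions of a one-way ANOVA.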
However, this doesn’t tell us much about which parties differ from each other, so I will run a post-hoc analysis to determine this.
| Group | Letter | MonoLetter |
|---|---|---|
| Democrat | a | a |
| Republican | b | b |
| Libertarian | c | c |
| Green | d | d |
| Independent | cd | cd |
After running the post-hoc analysis using the Dunn test, it is easy to see that almost all of the parties are significantly different from each other, with the exception of Independents, which do not differ significantly from either Libertarians or Greens.
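The idea of a pairwise post-hoc comparison can be illustrated in base R. Note that the sketch below is not the Dunn test used above: it substitutes `pairwise.wilcox.test()` with a Holm correction, which is a different procedure, and it runs on simulated data, purely to show what a matrix of adjusted pairwise p-values looks like.

```r
set.seed(1)

# Simulated amounts for three hypothetical groups (illustrative only).
toy <- data.frame(
  cand_party = factor(rep(c("Democrat", "Republican", "Libertarian"), each = 30)),
  contb_receipt_amt = c(rexp(30, 1 / 30), rexp(30, 1 / 60), rexp(30, 1 / 120))
)

# Pairwise rank-sum tests with Holm-adjusted p-values
# (a stand-in for the Dunn test, not the same method).
pw <- pairwise.wilcox.test(toy$contb_receipt_amt, toy$cand_party,
                           p.adjust.method = "holm")

# Lower-triangular matrix: one adjusted p-value per pair of groups.
pw$p.value
```

Groups that share a letter in the table above correspond to pairs whose adjusted p-value is not below the chosen significance level.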
| contbr_city | percent_sum |
|---|---|
| LOS ANGELES | 10.95 |
| SAN FRANCISCO | 10.19 |
| SAN DIEGO | 2.48 |
| PALO ALTO | 2.20 |
| BEVERLY HILLS | 2.17 |
| OAKLAND | 2.06 |
| SANTA MONICA | 1.94 |
| BERKELEY | 1.90 |
| SACRAMENTO | 1.56 |
| SAN JOSE | 1.55 |
In general, these maps based on location and contribution totals in dollars mirror the maps from earlier based on location and contribution totals by count. However, there are some exceptions. When looking at the raw numbers in the table, Beverly Hills and Palo Alto have moved up the list for total contributions in dollars, I assume because of these areas’ affluence.
As I did above with political parties, I will conduct a Kruskal-Wallis rank sum test to find out whether there is a statistically significant difference between cities.
##
## Kruskal-Wallis rank sum test
##
## data: contb_receipt_amt by contbr_city
## Kruskal-Wallis chi-squared = 44295, df = 2117, p-value < 2.2e-16
The results above show that, with a p-value well below 0.05, there is indeed a statistically significant difference in donation amounts among cities.
| contbr_occup_categ | percent_sum |
|---|---|
| Retired | 26.02 |
| Professional, Scientific, and Technical Services | 24.45 |
| Management of Companies and Enterprises | 13.20 |
| Unemployed | 7.61 |
| Arts, Entertainment, and Recreation | 5.62 |
| Health Care and Social Assistance | 5.46 |
| Homemaker | 5.39 |
| Educational Services | 4.32 |
| Real Estate and Rental and Leasing | 2.52 |
| Finance and Insurance | 2.16 |
| Administrative and Support and Waste Management and Remediation Services | 1.39 |
| Student | 1.08 |
| Agriculture, Forestry, Fishing and Hunting | 0.51 |
| Information | 0.15 |
| Transportation and Warehousing | 0.11 |
The only obvious changes between contribution amounts by occupation and contribution counts by occupation are increases in seven of the categories (the biggest in the “Professional, Scientific, and Technical Services” and “Management of Companies and Enterprises” categories) and decreases in three categories (the biggest in the “Unemployed” category). Retirees, “Professional, Scientific, and Technical Services”, and “Management of Companies and Enterprises” gave the most as groups.
##
## Kruskal-Wallis rank sum test
##
## data: contb_receipt_amt by contbr_occup_categ
## Kruskal-Wallis chi-squared = 17346, df = 14, p-value < 2.2e-16
The results above show that, with a p-value well below 0.05, there is indeed a statistically significant difference in donation amounts among occupation categories.
| Group | Letter | MonoLetter |
|---|---|---|
| Retired | a | a |
| Professional, Scientific, and Technical Services | b | b |
| Health Care and Social Assistance | a | a |
| Administrative and Support and Waste Management and Remediation Services | c | c |
| Unemployed | c | c |
| Educational Services | d | d |
| Arts, Entertainment, and Recreation | e | e |
| Management of Companies and Enterprises | f | f |
| Student | g | g |
| Real Estate and Rental and Leasing | h | h |
| Homemaker | h | h |
| Transportation and Warehousing | i | i |
| Finance and Insurance | j | j |
| Information | e | e |
| Agriculture, Forestry, Fishing and Hunting | k | k |
When looking at the individual candidates there doesn’t seem to be any general relationship at all, but once they are grouped into political parties there seems to be a relationship between party (especially the two major parties) and the amount of donations in dollars. The two major parties (Democrats and Republicans) had the highest donation totals, with Democrats being the clear winner. However, even though Democrats had the highest donation total overall, they also had the lowest median donation amount at $27. Not only did the most people donate the most money to Democrats, but they did so in relatively modest amounts per donation.
In addition, there seems to be a relationship between cities and the amount of donations given, with big cities (specifically San Francisco, Los Angeles, and San Diego) and the immediate surrounding area giving a higher amount of money than the countryside.
Finally, looking at donations by occupation showed that a few groups (specifically retirees, “Professional, Scientific, and Technical Services”, and “Management of Companies and Enterprises”) gave in amounts well above all the other groups.
From what I can see, no single relationship was stronger than the others. After running a Kruskal-Wallis rank sum test on each relationship, all three variables (political party, location by city, and occupation) showed statistically significant relationships to donation amounts (with p-values < 2.2e-16), with certain political parties, cities, and occupations receiving or giving far more donations (in both count and sum) than the others.
Hmm, this graph seems a bit convoluted with all 25 candidates and 15 occupation categories together in one graph. Probably best to take a look at this by political party instead.
| contbr_occup_categ | cand_nm | cand_party | count | sum | median |
|---|---|---|---|---|---|
| Retired | Bush, Jeb | Republican | 996 | 451364.00 | 50 |
| Retired | Carson, Benjamin S. | Republican | 13692 | 1168789.56 | 50 |
| Retired | Christie, Christopher J. | Republican | 38 | 29065.00 | 100 |
| Retired | Clinton, Hillary Rodham | Democrat | 129408 | 12342944.22 | 25 |
| Retired | Cruz, Rafael Edward ‘Ted’ | Republican | 23195 | 1715291.66 | 50 |
| Retired | Fiorina, Carly | Republican | 1971 | 343295.47 | 50 |
| Retired | Graham, Lindsey O. | Republican | 80 | 61645.00 | 250 |
| Retired | Huckabee, Mike | Republican | 206 | 49290.50 | 50 |
| Retired | Jindal, Bobby | Republican | 5 | 2250.00 | 250 |
| Retired | Johnson, Gary | Libertarian | 233 | 53835.55 | 100 |
With Democrats getting so much more money than everyone else, it’s hard to see the distributions of the rest of the parties. Let’s zoom in to get a better look.
With all the high value outliers it’s hard to see the IQRs of all the parties. Let’s zoom in to get a better look.
For these plots I decided to look at political parties and how much money was given to each party by occupation. I looked at donations in two different ways, first by the sum of donations and second by the individual donation amounts. From the plots you can see that Democrats received more donations by sum in every occupation except “Agriculture, Forestry, Fishing and Hunting” (where Republicans received more). In terms of donation amounts, Democrats had the most consistent median value across all occupations. The further right a party sits on the plot of political parties, the more its median values vary (with Independents seeming to vary the most widely). Perhaps this is due to the number of people who donated being lower for these parties.
| contbr_city | cand_party | count | sum | median | lat | lon |
|---|---|---|---|---|---|---|
|  | Democrat | 25 | 5725.00 | 100.0 | 36.77826 | -119.4179 |
| . | Republican | 1 | 2.40 | 2.4 | 36.77826 | -119.4179 |
| 1000 OAKS | Republican | 1 | 100.00 | 100.0 | 34.17056 | -118.8376 |
| 29 PALMS | Democrat | 97 | 2900.52 | 27.0 | 34.13556 | -116.0542 |
| 29 PALMS | Republican | 3 | 90.00 | 30.0 | 34.13556 | -116.0542 |
| -4086 | Republican | 1 | 40.00 | 40.0 | 37.38580 | -121.9731 |
| 90620BUENA PARK | Republican | 1 | 250.00 | 250.0 | 33.84287 | -118.0128 |
| 91352 | Republican | 1 | 250.00 | 250.0 | 34.23016 | -118.3520 |
| 91355 | Republican | 1 | 28.00 | 28.0 | 34.44003 | -118.5915 |
| 93271THREE RIVERS | Republican | 2 | 500.00 | 250.0 | 36.43884 | -118.9045 |
Looking at these maps, some towns where Democratic donors dominated are spread throughout California, but the majority of them seem to be concentrated in and around major cities. The other parties (Republicans especially) are much more spread out across California. In the cities, Democrats had a clear dominance in terms of the amount of money donated.
| contbr_city | contbr_occup_categ | count | sum | median | lat | lon |
|---|---|---|---|---|---|---|
| | Retired | 12 | 2750.00 | 250.0 | 36.77826 | -119.4179 |
| | Unemployed | 10 | 70.00 | 5.0 | 36.77826 | -119.4179 |
| | Management of Companies and Enterprises | 1 | 2700.00 | 2700.0 | 36.77826 | -119.4179 |
| | NA | 2 | 205.00 | 102.5 | 36.77826 | -119.4179 |
| . | NA | 1 | 2.40 | 2.4 | 36.77826 | -119.4179 |
| 1000 OAKS | Transportation and Warehousing | 1 | 100.00 | 100.0 | 34.17056 | -118.8376 |
| 29 PALMS | Unemployed | 4 | 95.00 | 25.0 | 34.13556 | -116.0542 |
| 29 PALMS | NA | 96 | 2895.52 | 27.0 | 34.13556 | -116.0542 |
| -4086 | Retired | 1 | 40.00 | 40.0 | 37.38580 | -121.9731 |
| 90620BUENA PARK | NA | 1 | 250.00 | 250.0 | 33.84287 | -118.0128 |
From what I can see, there seems to be a wider variety of jobs clustered around the cities. The further out into the countryside you go, the more “Educational Services” and “Agriculture, Forestry, Fishing and Hunting” jobs seem to dominate in terms of donations.
Democrats received more donations by sum in every occupation except “Agriculture, Forestry, Fishing and Hunting” (where Republicans received more). In terms of donation amounts, Democrats had the most consistent median value across all occupations, at around $27. The further left a political party sits on the plot, the more its median values vary (with Independents seeming to vary the most widely). Perhaps this is because fewer people donated to these parties.
Democratic donors are less spread out and are concentrated in and around major cities. The other parties (Republicans especially) are much more spread out across California. In the cities, Democrats had a clear dominance in terms of the amount of money donated.
It can be hard to distinguish between some of the colors for the occupations, but certain occupations seem to be clustered in particular regions. For example, “Student”, “Real Estate and Rental and Leasing”, and “Homemaker” cluster in and around Los Angeles (likely due to the high number of universities, real estate opportunities, and families in this area), and “Agriculture, Forestry, Fishing, and Hunting” clusters down the center of California (in the Central Valley, where a lot of the state’s agriculture is done).
Looking at the total donations for political parties, the plot shows that Democrats received far more donations than any other political party.
Median donations are fairly uniform across occupations, with most falling between $25 and $50.
These plots show the distribution of donations made to the various political parties by city. While Democrats may have received more money overall, most of that came from larger cities. Donations to the other parties (Republicans especially) are much more spread out.
This report explores a data set containing financial contributions made by California residents to Presidential candidates in the 2016 Presidential election. While the data set contained 19 variables, I whittled it down to only the 4 variables I thought were useful to analyze.
In general, this analysis seems to confirm a lot of my preconceived notions about the political landscape in California. The numbers show that California residents overall, and especially those in bigger cities, lean liberal and support Democrats. The Democratic Party beat the other parties in all the aspects I looked at. Overall, they had the most people donate to them and the most money donated to them, and when broken down by occupation they won the support of every occupational category in both count and dollar amount (with the exception of the “Agriculture, Forestry, Fishing, and Hunting” category). With all the surprises in this election, however, it would be interesting to run the same analysis on past elections and see how well the trends from this election hold.
There are some limitations in my analysis of occupations. As I mentioned briefly earlier, there are 25,387 unique occupations listed in the data set, and trying to sort all of them into their respective categories would have taken an inordinate amount of time. As a result, I only included the top 100 occupations. If I had included all of the occupations, the numbers might be a bit different. Perhaps the remaining donations would have gone to Republicans or other candidates, and the Democrats might not have had quite the same dominance across the board. Then again, perhaps nothing significant would have changed. Because of this limitation, however, the results for donations by occupation should be taken with a grain of salt.
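One way to gauge how much this limitation matters is to measure what share of the donation records and dollars the retained occupations cover. The following is a rough sketch on invented data. Ranking occupations by number of donations is an assumption, as is the small `top_n` used to keep the toy example readable.

```r
# Toy data: hypothetical occupations and amounts, not from the real data set.
contribs <- data.frame(
  contbr_occupation = c("RETIRED", "RETIRED", "RETIRED", "TEACHER",
                        "TEACHER", "ENGINEER", "FARMER", "ARTIST"),
  contb_receipt_amt = c(100, 50, 250, 27, 27, 500, 1000, 10)
)

top_n <- 2  # the report kept the top 100; 2 is used here for illustration

# Rank occupations by number of donations and keep the top_n most frequent.
occ_counts <- sort(table(contribs$contbr_occupation), decreasing = TRUE)
top_occs <- names(occ_counts)[seq_len(top_n)]

# Flag the donation records whose occupation was retained.
in_top <- contribs$contbr_occupation %in% top_occs

# Share of donation records and of dollars covered by retained occupations.
share_of_rows    <- mean(in_top)
share_of_dollars <- sum(contribs$contb_receipt_amt[in_top]) /
  sum(contribs$contb_receipt_amt)
```

If both shares came out close to 1 on the real data, the excluded occupations would be unlikely to change the picture much. A low dollar share would suggest the by-occupation results deserve extra caution.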